{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Abstract: Pose与Structure IO"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "蛋白质是大型生物分子，它由一个或多个由α-氨基酸残基组成的长链条组成。α-氨基酸分子呈线性排列，相邻α-氨基酸残基的羧基和氨基通过肽键连接在一起。\n",
    "蛋白质的分子结构可划分为四级，以描述其不同的方面：\n",
    "- 蛋白质一级结构：组成蛋白质多肽链的线性氨基酸序列。\n",
    "- 蛋白质二级结构：依靠不同氨基酸之间的C=O和N-H基团间的氢键形成的稳定结构，主要为α螺旋和β折叠。\n",
    "- 蛋白质三级结构：通过多个二级结构元素在三维空间的排列所形成的一个蛋白质分子的三维结构。\n",
    "- 蛋白质四级结构：用于描述由不同多肽链（亚基）间相互作用形成具有功能的蛋白质复合物分子。\n",
    "\n",
    "![title](./img/pose.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在这个章节中，你即将会学习到以下的一些信息：\n",
    "- Pose & Structure IO\n",
    "- PDBinfo & Pose\n",
    "- Atom、Residue & ResidueType\n",
    "- Conformation & Protein Geometry\n",
    "- Pose operating"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 简介: Pose的组织构架\n",
    "因此如果要在计算机中建立一个蛋白质的结构模型，就清楚地描述每一个原子的信息。在Rosetta中，Pose是管理蛋白质信息的中心，可以描述蛋白质一到四级结构所有的信息。而且这些信息是分层管理的比如:\n",
    "\n",
    "- Conformation: 负责管理原子类型(AtomType)、氨基酸类型(ResidueType)、氨基酸的原子坐标(xyz)、氨基酸连接方式的定义(FoldTree/AtomTree)等，这部分构成了蛋白质构象的所有物理信息。(最重要)\n",
    "- Energy: 负责管理氨基酸直接的能量计算所需的信息(EnergyGraph/energies)\n",
    "- ConstraintSet: 负责管理原子间的约束信息(constraints)\n",
    "- DataCache: 负责管理用户自定义的信息\n",
    "\n",
    "分层式管理使得Pose的信息修改和更新变得容易。\n",
    "\n",
    "除此以外，还有些外部对象如PDBinfo也负责转换和储存Pose与PDB之间的信息变换。\n",
    "\n",
    "以下是一个Pose中的架构的示意图：\n",
    "![title](./img/PoseObject.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.3 Pose中的PDB信息"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pose是通常是从PDB文件中衍生出来的，通常除了原子的坐标信息以外，PDB文件中含包含了许多额外的信息，而这些信息是储存在PDBinfo中。比如温度因子数据(bfactor)、晶体解析数据(crystinfo)、原子的占用率(occupancy)、Pose编号与PDB编号的转换以及Pose的序列信息等。**如果Pose中的氨基酸发生了插入和删除，那么这部分的信息就需要重新从Pose中进行更新**。以下列举PDBinfo的一些重要功能:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### **1.3.1 Rosetta编号与PDB编号**\n",
    "在PDB编号中，氨基酸的编号是依赖于其所在的链，如1A，2A...120A，1B，2B...130B等。\n",
    "而在Pose类中，氨基酸的编号是忽略链的分隔，按照链的顺序，**均从1开始递增**，因此在Pose中的氨基酸和PDB编号的对应是棘手的，但是我们可以通过PDB_info这个类中的pdb2pose和pose2pdb来解决转换问题"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 获取PDB号为24A的氨基酸残基所在的Pose残基编号\n",
    "pose = pose_from_pdb('./data/4R80.pdb')\n",
    "pose_pdbinfo = pose.pdb_info()\n",
    "pose_number = pose_pdbinfo.pdb2pose('A', 24)\n",
    "print(pose_number)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 获取Pose残基编号为24的氨基酸残基所在的PDB号\n",
    "pdb_number = pose_pdbinfo.pose2pdb(24)\n",
    "print(pdb_number)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 获取pose number为24的氨基酸残基所在的PDB链号\n",
    "pose_pdbinfo.chain(24)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### **1.3.2 PDB中晶体解析信息的提取**\n",
    "除了基本的编号信息以外，一些晶体相关的信息也可以轻松进行提取:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 提取bfactor信息\n",
    "pose.pdb_info().bfactor(1,1)  # 返回温度因子信息"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 获取PDB的晶体信息\n",
    "crystinfo = pose.pdb_info().crystinfo()\n",
    "print(crystinfo)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 获取原子的occupancy\n",
    "pose.pdb_info().occupancy(1, 1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.4 Pose中的结构信息\n",
    "PDB的信息可以通过PDBinfo获取，除此以外，Pose中还有大量的结构信息，我们可以轻松通过各类函数来获取更多的信息，以下将逐一地列举。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1.4.1 一级与二级结构信息的提取"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Pose中的基本信息，如残基数量，序列，FoldTree(后续讲述)\n",
    "print(pose)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Tips: 除此以外，还有一些具体的方法可以获取更多的细节:**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 返回pose中链的数目\n",
    "pose.num_chains()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 获取Pose的序列信息的方法\n",
    "print(f'所有的氨基酸:{pose.sequence()}\\n')\n",
    "print(f'前5个氨基酸:{pose.sequence(1,5)}\\n')\n",
    "print(f'1号链的所有氨基酸:{pose.chain_sequence(1)}\\n')\n",
    "print(f'2号链的所有氨基酸:{pose.chain_sequence(2)}\\n')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 获取Pose的氨基酸总数量方法\n",
    "pose.total_residue()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 通过DSSP获取二级结构信息\n",
    "from pyrosetta.rosetta.protocols.membrane import get_secstruct\n",
    "get_secstruct(pose)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1.4.2 氨基酸信息\n",
    "Residue Object是Pose的重要组成部分，整个Pose的Conformation是由一个个具体的氨基酸的具体构象组成，每个氨基酸有一个单独的object来描述，其中记录了所有的氨基酸信息。\n",
    "同样，通过Pose类，我们可以轻松地访问每个氨基酸对象，并提取其中的信息。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 获取的第24个氨基酸残基对象(Residue objects)\n",
    "residue24 = pose.residue(24)\n",
    "residue24"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 获取残基的Rosetta残基名、单字母缩写、三字母缩写、标注名\n",
    "print(pose.residue(1).name())\n",
    "print(pose.residue(2).name())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 结果解读:\n",
    "Rosetta中N段和C段以及形成了二硫键的半胱氨酸等都是以\"被修饰\"的状态存在，因为这些末端或成环氨基酸与那些处于肽链中的氨基酸的原子数目不同，因此描述他们拓扑结构的params文件不能直接复用，因此在Rosetta中提出了Patch系统，给这些特殊的氨基酸打上补丁来描述他们的拓扑结构。</br>\n",
    "因此为了区分，他们的命名也带上了补丁字样，比如上述的例子中，1号氨基酸名称为PRO:NtermProteinFull，其中NtermProteinFull就是他的\"补丁名\"。而第二号位于肽链中的氨基酸就是正常的三字母缩写的丝氨酸。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 获取氨基酸其他的缩写信息\n",
    "print(pose.residue(1).name1())\n",
    "print(pose.residue(1).name3())\n",
    "print(pose.residue(1).annotated_name())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 结果解读:\n",
    "氨基酸的名称不止一种，还可以通过residue中的一些属性获得单字母缩写、三字母缩写以及标注名。</br>\n",
    "以1号氨基酸为例，它的单字母缩写为P，三字母缩写为PRO，而标注名为单字母缩写外加Rosetta残基名。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 还可以直接打印这氨基酸的所有信息:\n",
    "print(residue24)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 结果解读:\n",
    "使用print打印信息后，可以获取这个对象中所有的信息:</br>\n",
    "如: 氨基酸的类型为丝氨酸(SER), 骨架原子和侧链的原子组成、残基性质、以及每个单独原子的三维坐标信息"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 通过Residue Object还可以判断氨基酸化学性质\n",
    "print(pose.residue(5).is_polar())\n",
    "print(pose.residue(5).is_aromatic())\n",
    "print(pose.residue(5).is_charged())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 结果解读:\n",
    "可见半胱氨酸残基并不是极性残基、不含有疏水芳香环、也不带电。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1.4.3 原子信息\n",
    "每个Residue Object都是由Atom Object组成，其中记录了所有的原子基本属性信息:</br>\n",
    "如原子名、元素类型、原子的一些化学和物理属性等。（**但不包括坐标信息**）"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 获取第24号残基的原子的信息, 可见24号氨基酸共有11个原子\n",
    "residue24.natoms()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 获取每个原子的信息:\n",
    "for atom_id in range(1, 11):\n",
    "    atom_name = residue24.atom_name(atom_id)  # 获取原子名称\n",
    "    print(f'atom_id is:{atom_id}, atom_name is: {atom_name}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 结果解读:\n",
    "此处我们获取了氨基酸内部的原子编号以及原子的PDB原子名"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 获取前3个原子的属性信息\n",
    "for atom_id in range(1, 4):\n",
    "    atom_type = residue24.atom_type(atom_id)  # 获取原子名称\n",
    "    print(atom_type)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 结果解读:\n",
    "通过atom_type方法，我们可以获取每个原子的细节的信息，如原子的Rosetta类型、原子的元素名、范德华半径、Lazaridis Karplus溶剂化参数等"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 也可以通过原子名反向查找原子的残基内原子的ID编号:\n",
    "ca_id = residue24.atom_index('CA')\n",
    "N_id = residue24.atom_index('N')\n",
    "print(ca_id, N_id)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 结果解读:\n",
    "24氨基酸中, N原子为1号原子, Cα为2号原子。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 获取原子的坐标的方式:\n",
    "x, y, z = residue24.xyz(\"CA\")\n",
    "print(f'x: {x}, y:{y}, z:{z}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 通过原子序号原子的坐标的方式:\n",
    "x, y, z = residue24.xyz(2)\n",
    "print(f'x: {x}, y:{y}, z:{z}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1.4.4 蛋白质几何信息\n",
    "Pose中描述具体的蛋白质构象，除了氨基酸类型以外，更是由原子间的键长、键角，二面角等一系列的具体参数构成。Pose中的Conformation对象负责记录了这些连接信息。\n",
    "![title](./img/phipsiomega.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "为了定位原子的信息，首先需要构建atom identifier对象，相当于创建一个ID卡，让Rosetta知道我们指定的原子是位于哪个氨基酸中的。通过AtomID，提供残基号，原子号，就可以创建atom identifier对象"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 获取原子间的键长、键角信息前需要构建atom identifier objects\n",
    "from pyrosetta.rosetta.core.id import AtomID\n",
    "atom1 = AtomID(1, 24)  # 24号残基的第一个原子\n",
    "atom2 = AtomID(2, 24)  # 24号残基的第二个原子\n",
    "atom3 = AtomID(3, 24)  # 24号残基的第三个原子\n",
    "atom4 = AtomID(4, 24)  # 24号残基的第四个原子\n",
    "print(atom1, atom2, atom3, atom4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "知道原子的ID后，就可以轻松的通过conformation对象来获取键长、键角等数据了。一般在Rosetta中键长和键角都设定为**理想值**。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 通过conformation层获取键长数据\n",
    "bond_length = pose.conformation().bond_length(atom1, atom2)\n",
    "\n",
    "# 通过conformation层获取键角数据(弧度)\n",
    "bond_angle = pose.conformation().bond_angle(atom1, atom2, atom3)\n",
    "\n",
    "print(f'原始键长:{bond_length}, 原始键角:{bond_angle}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 通过pose获取氨基酸的骨架二面角数据\n",
    "phi = pose.phi(24)\n",
    "psi = pose.psi(24)\n",
    "omega = pose.omega(24)\n",
    "print(f'原始phi角:{phi}, 原始psi角:{psi}, 原始omega角:{omega}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.5 Pose的操作\n",
    "Pose中储存了非常多的信息，同时还提供了接口可以让用户方便的对其中的信息进行修改（采样）。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1.5.1 Pose的创建和复制\n",
    "前几节中提及过，pose是一个容器，理所当然我们可以创建一个空的容器，里面什么都不放。</br>\n",
    "很多时候，我们需要创建空的Pose对象，便于保存当前pose的实例化状态，可作为可回溯点或初始状态，方便反复调用。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 通过创建一个新的Pose\n",
    "from pyrosetta import Pose\n",
    "starting_pose = Pose()\n",
    "starting_pose2 = None"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "此处通过两种方法，将已有的Pose信息放入新的容器里，一种是通过assign函数复制，一种是通过python赋值符号来赋值。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 方法1：通过assign复制新的构象\n",
    "starting_pose.assign(pose)\n",
    "\n",
    "# 方法2：通过python的直接赋值符号\n",
    "starting_pose2 = pose\n",
    "\n",
    "print(starting_pose)\n",
    "print('\\n')\n",
    "print(starting_pose2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**结果解读:</br>\n",
    "可见，两种方式“看”起来里面都有了新的pose信息。但真的如此么？**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 尝试调整mypose中的24号氨基酸phi值:\n",
    "pose.set_phi(24, 170.0)\n",
    "\n",
    "# 查看对starting_pose以及starting_pose2的影响:\n",
    "print(f'starting_pose 24 residue phi:{starting_pose.phi(24)}')\n",
    "print(f'starting_pose2 24 residue phi:{starting_pose2.phi(24)}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 结果解读:\n",
    "结果可见，starting_pose2是用过python直接赋值的Pose并没有复制，而只是pose的一个\"超链接\"符，并没有进行\"复制\"的操作。\n",
    "而通过assign的复制，原始的pose的任何调整都没有对starting_pose造成任何的影响，可见其独立性。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1.5.2 链与氨基酸的增减操作\n",
    "除了对整体信息的迁移，用户还可以对Pose中的链以及氨基酸进行操作。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### **1.5.2.1 链的切割处理**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "尽管pose的氨基酸编号是忽略多肽链的分隔的，但是pose中的链依然是根据多肽链的物理结构进行编号的，同理也是从1开始编号。</br>\n",
    "如一个蛋白中有2条链A和B，那么链编号结果即为1和2。其中A对应1号链，B对应2号链，与PDB的链顺序有关（当然A链的顺序如果在后面，那么B链就是1号链）。</br>\n",
    "Pose类中许多的方法可以很方便对链的增减进行操作, 以下2个举例进行说明:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 将Pose按照链的数量进行切割\n",
    "pose_list = pose.split_by_chain()\n",
    "pose_list"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "此处的pose_list中存放了2个数据，说明链已经被切割成2个独立的pose对象了。</br>\n",
    "通过python的索引，可以获得具体的pose:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 获取只含有第一个链的pose\n",
    "chain1_pose = pose.split_by_chain()[1]  # 直接切片索引链号。\n",
    "chain2_pose = pose.split_by_chain()[2]  # 直接切片索引链号。\n",
    "\n",
    "# check\n",
    "print(f'chain1_pose中的氨基酸总数:{chain1_pose.total_residue()}')\n",
    "print(f'chain2_pose中的氨基酸总数:{chain2_pose.total_residue()}')\n",
    "print(f'原始pose中的氨基酸总数:{pose.total_residue()}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### **1.5.2.2 链的合并处理**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "除了分隔操作，用户还可以通过一些简单的方式把链合并到一个pose中，此处使用append_pose_to_pose函数就可以达到目的。但需要注意，pose中的氨基酸、链的数量变化后，都需要对PDBinfo进行更新。否则PDBinfo的信息与Pose信息不对称。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 两条链的合并;\n",
    "# add binder to pose;\n",
    "from pyrosetta.rosetta.core.pose import append_pose_to_pose\n",
    "print(f'原始chain1_pose中的氨基酸总数:{chain1_pose.total_residue()}')\n",
    "append_pose_to_pose(chain1_pose, chain2_pose)\n",
    "chain1_pose.update_residue_neighbors()\n",
    "chain1_pose.update_pose_chains_from_pdb_chains()\n",
    "chain1_pose.conformation().detect_disulfides()\n",
    "\n",
    "# update pdbinfo; 别忘了更新pdbinfo;\n",
    "# 更新pdb_info; [别忘了]\n",
    "from pyrosetta.rosetta.core.pose import renumber_pdbinfo_based_on_conf_chains\n",
    "renumber_pdbinfo_based_on_conf_chains(pose)\n",
    "\n",
    "print(f'append之后的chain1_pose中的氨基酸总数:{chain1_pose.total_residue()}')\n",
    "chain1_pose.sequence()\n",
    "\n",
    "# 检查PDBinfo是否正确: Returns true if PDBInfo is obsolete and needs updating\n",
    "print(f'PDBinfo是否需要被更新:{pose.pdb_info().obsolete()}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### **1.5.3.3 氨基酸的删减操作**\n",
    "除了对链的合并之外，我们还可以对链中的氨基酸进行添加、删除的操作！具体的过程是用户需要创建一个独立的氨基酸(residue object)，并将这个氨基酸加载到现有的构像中。</br>\n",
    "加载的方式可以是前置后后置，根据使用的函数不同而定。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 在链的前端添加新的氨基酸或删除氨基酸\n",
    "from pyrosetta.rosetta.core.conformation import ResidueFactory\n",
    "from pyrosetta.rosetta.core.chemical import ChemicalManager\n",
    "\n",
    "print(f'原始氨基酸总数:{pose.total_residue()}')\n",
    "print(f'原始氨基酸序列:{pose.sequence()}\\n')\n",
    "\n",
    "# 向前添加氨基酸\n",
    "chm = ChemicalManager.get_instance()\n",
    "rts = chm.residue_type_set(\"fa_standard\")\n",
    "new_rsd = ResidueFactory.create_residue(rts.name_map('ALA')) # 创建一个residue object\n",
    "pose.prepend_polymer_residue_before_seqpos(new_rsd, 1, True)  # 在第一个氨基酸前添加一个ALA\n",
    "\n",
    "print(f'向前添加之后氨基酸总数:{pose.total_residue()}')\n",
    "print(f'向前添加之后氨基酸序列:{pose.sequence()}\\n')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 向后添加氨基酸\n",
    "last_residue = pose.total_residue()\n",
    "pose.append_polymer_residue_after_seqpos(new_rsd, last_residue, True)  # 在第一个氨基酸前添加一个ALA\n",
    "\n",
    "print(f'向后添加之后氨基酸总数:{pose.total_residue()}')\n",
    "print(f'向后添加之后氨基酸序列:{pose.sequence()}\\n')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 删除氨基酸\n",
    "pose.delete_polymer_residue(1)  # 删除第一个氨基酸\n",
    "\n",
    "print(f'删除之后氨基酸总数:{pose.total_residue()}')\n",
    "print(f'删除之后氨基酸序列:{pose.sequence()}\\n')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 还可以范围性的删除氨基酸\n",
    "pose.delete_residue_range_slow(1,5) # 删除第一个至第五个氨基酸\n",
    "\n",
    "print(f'删除之后氨基酸总数:{pose.total_residue()}')\n",
    "print(f'删除之后氨基酸序列:{pose.sequence()}\\n')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### **1.5.3.4 PBDinfo更新**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 更新pdb_info; [别忘了!]\n",
    "from pyrosetta.rosetta.core.pose import renumber_pdbinfo_based_on_conf_chains\n",
    "\n",
    "renumber_pdbinfo_based_on_conf_chains(pose)  # 更新PDBinfo.\n",
    "\n",
    "# 检查PDBinfo是否正确: Returns true if PDBInfo is obsolete and needs updating\n",
    "print(f'PDBinfo是否需要被更新:{pose.pdb_info().obsolete()}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1.5.3 构象的调整"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "除了对多肽链的氨基酸数量的调整，我们还可以通过Pose中的一些函数来调整蛋白质的具体构象，如主链的phi/psi角、化学键中的键长与键角数据等。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### **1.5.3.1 化学键的数据调整**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 修改键长键角必须通过conformation层进行处理:\n",
    "print(f'原始键长:{bond_angle}, 原始键角:{bond_length}')\n",
    "\n",
    "pose.conformation().set_bond_angle(atom1, atom2, atom3, 0.66666 * 3.14)\n",
    "new_bond_angle = pose.conformation().bond_angle(atom1, atom2, atom3)\n",
    "\n",
    "pose.conformation().set_bond_length(atom1, atom2, 1.44)\n",
    "new_bond_length = pose.conformation().bond_length(atom1, atom2)\n",
    "\n",
    "print(f'新的键长:{new_bond_length}, 新的键角:{new_bond_angle}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 修改phi、psi、chi、omega角可以直接通过pose的函数:\n",
    "# 通过pose获取氨基酸的骨架二面角数据\n",
    "print(f'原始phi角:{pose.phi(24)}, 原始psi角:{pose.psi(24)}, 原始omega角:{pose.omega(24)}')\n",
    "pose.set_phi(24, 66.0)\n",
    "pose.set_psi(24, 55.0)\n",
    "pose.set_omega(24, 180.0)\n",
    "\n",
    "print(f'调整后phi角:{pose.phi(24)}, 调整后psi角:{pose.psi(24)}, 调整后omega角:{pose.omega(24)}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### **1.5.3.2 氨基酸类型的调整(突变)**\n",
    "除了具体的化学键数据的调整，在PyRosetta中进行氨基酸的类型调整也是很方便的"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 调整氨基酸的类型\n",
    "from pyrosetta.toolbox import mutate_residue\n",
    "print(f'原始氨基酸类型:{pose.residue(1).name()}')\n",
    "print('突变氨基酸中...')\n",
    "mutate_residue(pose, 1, 'A', 9.0)  # 1 代表氨基酸突变的pose编号，9.0代表对氨基酸附近9埃范围内的氨基酸进行侧链优化，适应新的突变。\n",
    "print(f'突变后氨基酸类型:{pose.residue(1).name()}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### **1.5.3.3 原子坐标的修改**\n",
    "原子坐标的修改需要获取residue对象，并获取原子ID(atom identifier objects)。通过pose.set_xyz函数设定新的xyz坐标, 但用户一般不需要”显式“地修改原子坐标, 除非你明白这样操作的意义。</br>\n",
    "此处以创建一个镜像原子进行说明:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 原子坐标的修改（一般不需要这样操作）\n",
    "from pyrosetta.rosetta.numeric import xyzVector_double_t\n",
    "\n",
    "# 对第24个氨基酸的所有原子的x坐标乘上一个负号:\n",
    "residue24 = pose.residue(24)  # 获取residue对象\n",
    "for atom_id, atom in enumerate(residue24.atoms()):\n",
    "    x, y, z = atom.xyz()\n",
    "    print(f'坐标进行修改前信息: 原子号:{atom_id+1}, x:{x}, y:{y}, z:{z}')\n",
    "    \n",
    "    mirror_xyz = xyzVector_double_t(-x, y, z)  # 乘上负号.\n",
    "    atom_index = AtomID(atom_id+1, 24)   # 24号氨基酸的第x个原子的id\n",
    "    pose.set_xyz(atom_index, mirror_xyz) # 设置xyz坐标\n",
    "\n",
    "print('\\n')\n",
    "    \n",
    "for atom_id, atom in enumerate(residue24.atoms()):\n",
    "    x, y, z = atom.xyz()\n",
    "    print(f'坐标进行修改后信息:  原子号:{atom_id+1}, x:{x}, y:{y}, z:{z}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.6 Pose的能量\n",
    "如果现在我们已有一个Pose，我们想评估这个蛋白质的能量评分，可以直接通过创建一个打分函数并对Pose的能量进行计算，</br>\n",
    "再可以通过pose中的energies对象将氨基酸残基的One-body, Two-body的能量信息列出，达到残基级别能量\"分解\"的目的。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1.6.1 对结构进行能量计算\n",
    "create_score_function函数可以用于快速创建一个打分函数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## 创建标准打分函数\n",
    "from pyrosetta import create_score_function\n",
    "scorefxn = create_score_function('ref2015')\n",
    "\n",
    "# 对当前Pose中的构象进行能量计算\n",
    "weighted_total_score = scorefxn(pose)\n",
    "print(weighted_total_score)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1.6.2 能量信息\n",
    "Rosetta的能量是加权后的能量，每个能量项有自己的权重，通过energies获取的是能量项的原始结果（unweighted）。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 获取能量对象\n",
    "scores = pose.energies()\n",
    "\n",
    "# 获取1号残基的所有能量项的信息:\n",
    "print(scores.show(1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Rosetta中的Score项有许多，如fa_atr代表范德华吸引势力, fa_rep代表范德华排斥项, fa_elec代表静电项等。</br>\n",
    "比如通过ScoreType类下的属性，即可搜索到对应的能量项。</br>\n",
    "获取总能中的某一个项的值可以直接使用python的索引功能，十分方便:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 获取总fa_atr项能量项的得分结果:\n",
    "from pyrosetta.rosetta.core.scoring import ScoreType\n",
    "pose.energies().total_energies()[ScoreType.fa_atr]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 单独获得第5号氨基酸的fa_atr项能量得分:\n",
    "pose.energies().residue_total_energies(5)[ScoreType.fa_atr]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.7 自定义信息\n",
    "Pose中含有让用户自定义写入任何信息的功能，比如在程序设计过程中，中间生成的临时数值或字符都可以写入到PoseExtraScore中，这些信息会随着Pose一并输出到PDB或则Silent文件中，在后续的分析和处理的过程中非常方便。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 给pose加入额外的信息: 比如filter计算的值就可以储存.\n",
    "from pyrosetta.rosetta.core.pose import setPoseExtraScore, getPoseExtraScore\n",
    "\n",
    "setPoseExtraScore(pose, \"distance\", 1.0)\n",
    "setPoseExtraScore(pose, \"angle\", 120.5)\n",
    "\n",
    "# 提取信息\n",
    "print(getPoseExtraScore(pose, 'distance'))\n",
    "print(getPoseExtraScore(pose, 'angle'))  # 目前有bug，但是信息已经储存在pose中了"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "pyrosetta",
   "language": "python",
   "name": "pyrosetta"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
